Connected Component Based Word Spotting on Persian Handwritten Image Documents

Authors: not saved
Abstract:

Word spotting makes unindexed image documents searchable by locating one or more words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten documents using attribute-based classification and label embedding. For this purpose, a hierarchical framework is proposed in which, at first, candidates are selected based on the sequence of connected components (CCs). Then, the query word is segmented into its constituent CCs, and document regions with a similar CC count are selected based on the distance between their CC count and that of the query word. As a result, the candidate regions are extracted. In the final phase, the query word is located only in the candidate regions of the document. A well-known Persian handwritten text dataset, namely FTH, is chosen as a benchmark for the presented method. The results show that the proposed method outperforms the state-of-the-art methods, reaching 81.02 percent for unseen word class retrieval.
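
As a rough illustration of the candidate-selection idea described in the abstract, the sketch below counts connected components in binarized images and keeps the document regions whose CC count is close to that of the query word. It is not the authors' implementation: the function names, the 8-connectivity choice, the absolute-difference distance, and the `max_diff` threshold are all assumptions made for illustration, and the attribute-based classification stage is not covered.

```python
from collections import deque


def count_components(img):
    """Count 8-connected components of foreground (truthy) pixels in a 2D grid."""
    rows = len(img)
    cols = len(img[0]) if img else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                # New component found: flood-fill it with a breadth-first search.
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count


def select_candidates(query_img, region_imgs, max_diff=1):
    """Return indices of regions whose CC count is within max_diff of the query's.

    max_diff is a hypothetical tolerance, not a value taken from the paper.
    """
    query_count = count_components(query_img)
    return [
        i for i, region in enumerate(region_imgs)
        if abs(count_components(region) - query_count) <= max_diff
    ]


# Toy binary images: 1 = ink, 0 = background.
query = [[1, 0, 1],
         [1, 0, 1]]
regions = [
    [[1, 1, 1]],
    [[1, 0, 1, 0, 1, 0, 1, 0, 1]],
    [[1, 0, 1]],
]
candidates = select_candidates(query, regions, max_diff=1)
```

A cheap filter of this kind only narrows the search; under the framework in the abstract, the more expensive matching step would then run on the surviving candidate regions alone.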


similar resources

Offline Word Spotting in Handwritten Documents

The digitization of written human knowledge into string data has reached up to but not beyond the recognition of typeset text. This means that vast libraries of handwritten, cursive documents must be indexed and transcribed by a human—a prohibitively laborious task. This paper explores an existing technique developed in [1] and [12] for the offline indexation of historical handwritten documents...


Word Spotting in Handwritten Arabic Documents Using Bag-Of-Descriptors

This paper presents a query-by-example word spotting method for handwritten Arabic documents, based on the Scale Invariant Feature Transform (SIFT), without using any word or line segmentation approach, because any segmentation errors affect the subsequent word representation. First, the interest points are automatically extracted from the images using the SIFT detector; then, we use the SIFT descriptor to represent eac...


Segmentation-free Word Spotting for Handwritten Arabic Documents

In this paper we present an unsupervised segmentation-free method for spotting and searching for a query, especially in handwritten Arabic document images. For this, Histograms of Oriented Gradients (HOGs) are used as the feature vectors to represent the query and document images. Then, we compress the descriptors with the product quantization method. Finally, a better representati...


On the Influence of Word Representations for Handwritten Word Spotting in Historical Documents

Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a framework of query-by-example word spotting in historical documents. We compare four word representation models, namely...


Query Word Image based Retrieval Scheme for Handwritten Tamil Documents

This paper presents an autoassociative neural network (AANN) based information retrieval mechanism for locating handwritten documents in a Tamil literary collection that correspond to query word images. The strategy creates models for the chosen search word images, develops a methodology to identify the search word, and subsequently retrieves the relevant documents. AANN emphas...


Word Spotting: Indexing Handwritten Archives

There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the early Presidential papers at the Library of Congress and the collected works of W. B. DuBois at the library of the University of Massachusetts. The standard technique for indexing documents is to scan them in, convert them to machine readable form (ASCII) using Optical Characte...




Journal title

volume 10 issue 2

pages 11-21

publication date 2019-12-01
